Fast Trigonometric Functions Using Intel’s SSE2 Instructions
Abstract
The goal of this work was to answer one simple question: given that the trigonometric functions take hundreds of clock cycles to execute on a Pentium IV, can they be computed faster, especially given that all Intel processors now have fast floating-point hardware? The streaming SIMD extensions (SSE/SSE2) in every Pentium III and IV provide both scalar and vector modes of computation, so it has been our goal to use the vector hardware to compute the cosine and other trigonometric functions. The cosine function was chosen, as it has significant use in our research as well as in image construction with the discrete cosine transform.
Similar resources
BLAKE and 256-bit advanced vector extensions
Intel recently documented its AVX2 instruction set extension that introduces support for 256-bit wide single-instruction multiple-data (SIMD) integer arithmetic over double (32-bit) and quad (64-bit) words. This will enable Intel’s future processors (starting with the Haswell architecture, to be released in 2013) to fully support 4-way SIMD computation of 64-bit ARX algorithms (32-bit is alread...
SSE Implementation of Multivariate PKCs on Modern x86 CPUs
Multivariate Public Key Cryptosystems (MPKCs) are often touted as future-proofing against quantum computers. They have also been known for their efficiency compared to “traditional” alternatives. However, this advantage seems to erode with the increase of arithmetic resources in modern CPUs and improved algorithms, especially with respect to Elliptic Curve Cryptography (ECC). In this paper, we show that...
Data-Level and Thread-Level Parallelism in Emerging
Multimedia applications are becoming increasingly important for a large class of general-purpose processors. Contemporary media applications are highly complex and demand high performance. A distinctive feature of these applications is that they have significant parallelism, including thread-, data-, and instruction-level parallelism, that is potentially well-aligned with the increasing paralle...
Fast Implementation of RC6 Using Intel's SSE2 Instructions*
RC6 is a symmetric block cipher, designed by RSA Laboratories to meet the requirements of the AES competition. As one of the five AES finalists, RC6 achieves good performance with a high level of security, and is especially well suited to parallel processing. SSE2 is a set of Intel's instruction extensions in the IA-32 SIMD programming model. It provides the ability to perform SIMD operations on 128-bit...
Fast Realization of Digital Elevation Model
We propose an optimization approach to speed up the point matching process underlying the 3D reconstruction of complex urban scenes. We consider the Optical Flow technique for point matching and propose introducing MMX and SSE2 instructions to significantly accelerate the matching process. Fast point matching allows using sub-pixel image resolution, which provides a more accurate estimation of...
Publication date: 2003